
Add PostgreSQL support and restore related configurations#59

Merged
romeokienzler merged 7 commits into main from add_postgres_example
Apr 24, 2026

Conversation

@romeokienzler
Collaborator

This pull request introduces a plugin-based architecture for Optuna coordinator (storage backend) selection in the iterate2 module, enabling flexible, extensible support for storage backends such as SQLite, JournalFS, and PostgreSQL. It also adds first-class support for PostgreSQL as a coordinator backend, improves HPO configuration ergonomics, and provides an example script and dependency management for PostgreSQL-based workflows.

Key changes include:

1. Coordinator Plugin System and PostgreSQL Support

  • Refactored storage backend selection in iterate2 to use a new plugin registry (CoordinatorPlugin), allowing easy addition of new coordinator backends. (terratorch_iterate/iterate2/plugin/coordinator/__init__.py, terratorch_iterate/iterate2/_iterate2.py, terratorch_iterate/iterate2/__init__.py, terratorch_iterate/iterate2/plugin/__init__.py)
  • Implemented and auto-registered coordinator plugins for SQLite, JournalFS, and PostgreSQL, with robust matching and configuration logic. (terratorch_iterate/iterate2/plugin/coordinator/sqlite.py, journalfs.py, postgresql.py)
  • Added dependency management for PostgreSQL coordinator via a new postgresql extra in pyproject.toml, using psycopg2-binary.
  • Added a new example script for running HPO trials with LSF and PostgreSQL as the Optuna backend. (examples/run_lsf_gridfm_example_postgres.sh)
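The registry-plus-matching pattern described above can be sketched roughly as follows. All names here (CoordinatorPlugin, register, select_coordinator) and the URL-prefix matching rule are illustrative assumptions, not the actual terratorch-iterate API:

```python
# Hypothetical sketch of a coordinator plugin registry: plugins self-register
# via a decorator, and the coordinator is chosen by matching the storage URL.
_REGISTRY: list[type["CoordinatorPlugin"]] = []


def register(cls):
    """Class decorator that auto-registers a coordinator plugin."""
    _REGISTRY.append(cls)
    return cls


class CoordinatorPlugin:
    @classmethod
    def matches(cls, url: str) -> bool:
        raise NotImplementedError

    @classmethod
    def storage_url(cls, url: str) -> str:
        return url  # default: pass the URL through to Optuna unchanged


@register
class SQLitePlugin(CoordinatorPlugin):
    @classmethod
    def matches(cls, url: str) -> bool:
        return url.startswith("sqlite://")


@register
class PostgreSQLPlugin(CoordinatorPlugin):
    @classmethod
    def matches(cls, url: str) -> bool:
        return url.startswith("postgresql://")


def select_coordinator(url: str) -> type[CoordinatorPlugin]:
    """Return the first registered plugin whose matcher accepts the URL."""
    for plugin in _REGISTRY:
        if plugin.matches(url):
            return plugin
    raise ValueError(f"no coordinator plugin matches {url!r}")
```

With this shape, adding a new backend (e.g. JournalFS) is just another decorated class; no changes to the selection logic are needed.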

2. HPO Configuration Improvements

  • Consolidated HPO resource selection into a new compute group hyperparameter in gridfm_graphkit_hpo.yaml, which jointly selects gpu_num, num_workers, and batch_size for better scaling and usability.
  • Updated metrics to use "Validation loss" as the primary tracked metric.
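A grouped hyperparameter like this typically replaces three independent search dimensions with one categorical choice, so incompatible combinations (e.g. a large batch on a single GPU) are never sampled. The shape below is illustrative only; the group name and member values are assumptions, not the actual gridfm_graphkit_hpo.yaml contents:

```yaml
# Hypothetical sketch: one "compute" choice co-selects all three resources.
compute:
  type: categorical
  choices:
    - {gpu_num: 1, num_workers: 4, batch_size: 32}
    - {gpu_num: 2, num_workers: 8, batch_size: 64}
    - {gpu_num: 4, num_workers: 16, batch_size: 128}
```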

3. Reliability and Usability Enhancements

  • Added logic to automatically re-queue a portion of failed trials (25%) for retry, improving robustness of HPO runs. (terratorch_iterate/iterate2/_iterate2.py)
  • Improved logging and error handling throughout the coordinator selection and storage initialization process.
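The retry heuristic in the first bullet can be sketched as a small selection function. The policy shown (take the first quarter of failed trials' parameter sets, rounding up) is an assumption; the real logic in _iterate2.py may order or filter trials differently:

```python
import math


def trials_to_requeue(failed_params: list[dict], fraction: float = 0.25) -> list[dict]:
    """Pick a fraction of failed trials' parameter sets to re-queue for retry.

    failed_params: parameter dicts of trials that ended in a FAIL state.
    Returns the subset to retry; rounds up so at least one failed trial
    is retried whenever any exist.
    """
    n = math.ceil(len(failed_params) * fraction)
    return failed_params[:n]
```

With Optuna, each returned parameter dict could then be handed to `study.enqueue_trial()`, which queues a trial with fixed parameters for the next worker to pick up.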

These changes collectively make the HPO workflow more robust, scalable, and easier to configure for a variety of cluster and database setups.

@romeokienzler romeokienzler merged commit 36e2ce3 into main Apr 24, 2026
0 of 3 checks passed
